chat : Avoid partial reasoning tags in response content #15149

Closed

p1-0tr (Contributor) commented Aug 7, 2025

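If a model uses a multi-part reasoning tag, we can end up with part of the tag in the message content when using streaming mode. For example, with gpt-oss, whose reasoning opener `<|channel|>analysis<|message|>` is generated as several tokens: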

```
$ curl -N http://localhost:8080/v1/chat/completions -d '{
  "model": "hf.co/unsloth/gpt-oss-20b-gguf:q6_k_xl",
  "messages": [
    {"role": "user", "content": "Hello, how are you?"}
  ],
  "stream": true
}' -H "Content-Type: application/json"
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"role":"assistant","content":null}}],"created":1754562630,"id":"chatcmpl-bJReFN26YAf6IQXxNNSWT8Rk8q0NwfDk","model":"hf.co/unsloth/gpt-oss-20b-gguf:q6_k_xl","system_fingerprint":"b1-9515c61","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":"<|channel|>"}}],"created":1754562630,"id":"chatcmpl-bJReFN26YAf6IQXxNNSWT8Rk8q0NwfDk","model":"hf.co/unsloth/gpt-oss-20b-gguf:q6_k_xl","system_fingerprint":"b1-9515c61","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":"analysis"}}],"created":1754562630,"id":"chatcmpl-bJReFN26YAf6IQXxNNSWT8Rk8q0NwfDk","model":"hf.co/unsloth/gpt-oss-20b-gguf:q6_k_xl","system_fingerprint":"b1-9515c61","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"reasoning_content":"The"}}],"created":1754562630,"id":"chatcmpl-bJReFN26YAf6IQXxNNSWT8Rk8q0NwfDk","model":"hf.co/unsloth/gpt-oss-20b-gguf:q6_k_xl","system_fingerprint":"b1-9515c61","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{}}],"created":1754562630,"id":"chatcmpl-bJReFN26YAf6IQXxNNSWT8Rk8q0NwfDk","model":"hf.co/unsloth/gpt-oss-20b-gguf:q6_k_xl","system_fingerprint":"b1-9515c61","object":"chat.completion.chunk"}

...
```
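The leak can be reproduced with a small C++ sketch. It assumes (hypothetically) a server that re-parses the accumulated text after every token and streams only the newly added part of `content`; `parsed_msg`, `naive_parse` and `stream_content_deltas` are illustrative names, not llama.cpp APIs, and the opener string assumes gpt-oss's `<|channel|>analysis<|message|>` layout.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical parse result: reasoning vs. visible content.
struct parsed_msg {
    std::string reasoning;
    std::string content;
};

// Naive parser: recognises the reasoning opener only when it is fully present
// at the start of the text. The closing tag is ignored for brevity.
parsed_msg naive_parse(const std::string & text) {
    static const std::string open = "<|channel|>analysis<|message|>";
    parsed_msg msg;
    if (text.compare(0, open.size(), open) == 0) {
        msg.reasoning = text.substr(open.size());
    } else {
        msg.content = text;  // opener incomplete, so everything counts as content
    }
    return msg;
}

// Re-parses the accumulated text after every token and streams only content
// beyond what was already sent. Text that was already sent cannot be retracted
// when a later parse reclassifies it as reasoning.
std::vector<std::string> stream_content_deltas(const std::vector<std::string> & tokens) {
    std::string acc;
    std::size_t sent = 0;
    std::vector<std::string> deltas;
    for (const auto & tok : tokens) {
        acc += tok;
        const parsed_msg msg = naive_parse(acc);
        if (msg.content.size() > sent) {
            deltas.push_back(msg.content.substr(sent));
            sent = msg.content.size();
        }
    }
    return deltas;
}
```

Fed the tokens `<|channel|>`, `analysis`, `<|message|>`, `The`, this sketch streams `<|channel|>` and `analysis` as content before the full opener is recognised, mirroring the deltas in the curl output.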

This happens because the chat parser can't fully match the reasoning tag while only its first parts have been generated, so those parts are reported as content and streamed to the client. To fix this, modify `try_consume_literal()` to speculatively consume a partially matching string when the parser is constructed with `partial` set to true.
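The speculative match can be sketched in C++ as follows. `mini_parser` and `parse_reasoning` are hypothetical stand-ins, not llama.cpp's actual chat parser; only the behaviour of `try_consume_literal()` in partial mode is modelled, and the gpt-oss closing tag is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical minimal cursor over model output (illustration only).
class mini_parser {
  public:
    mini_parser(std::string input, bool is_partial)
        : input_(std::move(input)), is_partial_(is_partial) {}

    // Consumes `literal` if it is fully present at the cursor. If the parser is
    // partial (more tokens may arrive) and the rest of the input is a non-empty,
    // strict prefix of `literal`, consumes it speculatively and returns true.
    bool try_consume_literal(const std::string & literal) {
        assert(pos_ <= input_.size());
        if (input_.compare(pos_, literal.size(), literal) == 0) {
            pos_ += literal.size();
            return true;
        }
        const std::size_t rest = input_.size() - pos_;
        if (is_partial_ && rest > 0 && rest < literal.size() &&
            literal.compare(0, rest, input_, pos_, rest) == 0) {
            pos_ = input_.size();  // the tail may be the start of `literal`
            return true;
        }
        return false;
    }

    // Returns everything after the cursor and moves the cursor to the end.
    std::string consume_rest() {
        std::string out = input_.substr(pos_);
        pos_ = input_.size();
        return out;
    }

  private:
    std::string input_;
    bool is_partial_;
    std::size_t pos_ = 0;
};

struct parsed_msg {
    std::string reasoning;
    std::string content;
};

// Splits text into reasoning and content using the gpt-oss reasoning opener.
parsed_msg parse_reasoning(const std::string & text, bool is_partial) {
    static const std::string open = "<|channel|>analysis<|message|>";
    mini_parser p(text, is_partial);
    parsed_msg msg;
    if (p.try_consume_literal(open)) {
        msg.reasoning = p.consume_rest();
    } else {
        msg.content = p.consume_rest();
    }
    return msg;
}
```

With `is_partial` set, `<|channel|>` and `<|channel|>analysis` both yield empty content, so nothing leaks; a final (non-partial) parse never speculates. The trade-off is that any trailing text resembling the start of the tag, even a lone `<`, is withheld until more tokens arrive; if the guess is wrong, the next parse reports it as ordinary content, and since nothing was streamed yet, nothing has to be retracted.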

github-actions bot added the `testing` (Everything test related) label Aug 7, 2025
p1-0tr force-pushed the ps-fix-reasoning-tags-in-content branch from 7e13319 to 4c64211 August 7, 2025 11:09
Signed-off-by: Piotr Stankiewicz <[email protected]>
p1-0tr force-pushed the ps-fix-reasoning-tags-in-content branch from 4c64211 to 82bf586 August 11, 2025 07:10

p1-0tr (Contributor, Author) commented Aug 14, 2025

No longer needed with #15181

p1-0tr closed this Aug 14, 2025